Practical Uncertainty Minimization for Spectral Macrostate Data Clustering

Authors

  • Brian White
  • David Shalloway
Abstract

Spectral clustering, which uses the global information embedded in eigenvectors of an inter-item relation matrix, can outperform traditional approaches such as k-means and hierarchical clustering. Spectral hierarchical bipartitioning is well-understood, but spectral multipartitioning remains an interesting research topic. Korenblum and Shalloway [Phys. Rev. E 67, 056704 (2003)] used an analogy to the dynamic coarse-graining of a stochastic system and the principle of cluster uncertainty minimization to motivate a fuzzy spectral multipartitioning method, macrostate data clustering (MDC), that could solve problems that defeated other methods. However, MDC poses a challenging non-convex global optimization problem that was solved by a brute-force technique unlikely to scale to problem sizes beyond O(10). Here we provide further tests of the accuracy of MDC and develop a new method for solving the optimization problem, which scales to data sets at least two orders of magnitude larger. This range includes problems of significant biological interest, such as microarray analysis of gene expression data. Moreover, we show that the method of Weber et al. [Tech. Rep. 04-39, Konrad-Zuse-Zentrum für Informationstechnik Berlin (2004)] provides a zeroth-order solution to the minimum uncertainty problem and provide a new geometric interpretation of the solution. We show how this approximation can be naturally extended to an exact solution and how the conditions for its validity can be extended past those previously proposed.
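The basic pipeline the abstract refers to (eigenvectors of an inter-item transition matrix, then a linear map from eigenvector coordinates to fuzzy cluster memberships) can be illustrated with a short NumPy sketch. This is not the paper's MDC minimum-uncertainty optimizer. It implements the inner simplex vertex search known from PCCA+ (Deuflhard and Weber), which we assume is representative of the zeroth-order approach attributed to Weber et al. The geometric idea is that the rows of the leading-eigenvector matrix lie approximately inside a simplex whose vertices correspond to the clusters. Choosing m rows as vertices and inverting the resulting m x m matrix then yields memberships that sum to one per item. The Gaussian kernel, the width `sigma`, and the function names `transition_eigenvectors`, `inner_simplex_vertices` and `fuzzy_memberships` are hypothetical choices for illustration.

```python
import numpy as np


def transition_eigenvectors(X, m, sigma):
    """Leading m right eigenvectors of a row-stochastic transition matrix P = D^-1 W.

    Assumption (not from the paper): affinities come from a Gaussian kernel of width sigma.
    """
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    W = np.exp(-sq / (2.0 * sigma**2))   # symmetric inter-item affinity matrix
    d = W.sum(axis=1)                    # item degrees (row sums of W)
    # P = D^-1 W is similar to the symmetric S = D^-1/2 W D^-1/2, so we
    # diagonalize S with eigh and map back to P's eigenvectors: v = D^-1/2 u.
    S = W / np.sqrt(np.outer(d, d))
    evals, U = np.linalg.eigh(S)         # eigenvalues in ascending order
    top = np.argsort(evals)[::-1][:m]    # indices of the m largest eigenvalues
    return evals[top], U[:, top] / np.sqrt(d)[:, None]


def inner_simplex_vertices(Y):
    """Pick m rows of Y (n x m) that approximately span a simplex containing all rows."""
    _, m = Y.shape
    first = int(np.argmax(np.linalg.norm(Y, axis=1)))
    idx = [first]
    R = Y - Y[first]                     # shift so the first vertex sits at the origin
    for _ in range(1, m):
        norms = np.linalg.norm(R, axis=1)
        j = int(np.argmax(norms))        # row farthest from the affine span found so far
        idx.append(j)
        v = R[j] / norms[j]
        R = R - np.outer(R @ v, v)       # Gram-Schmidt step: project out that direction
    return idx


def fuzzy_memberships(X, m, sigma):
    """Hypothetical helper: n x m membership matrix chi = Y V^-1."""
    _, Y = transition_eigenvectors(X, m, sigma)
    V = Y[inner_simplex_vertices(Y)]     # m x m matrix of selected vertex rows
    # Rows of chi sum to 1 because the constant vector (eigenvalue 1 of P)
    # lies in the column span of Y.
    return Y @ np.linalg.inv(V)
```

For well-separated groups the memberships come out close to 0/1 indicators, and `chi.argmax(axis=1)` gives a hard partition. When clusters overlap, this simple vertex choice can produce slightly negative memberships, and the sketch makes no attempt to correct them.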

Similar resources

Efficient uncertainty minimization for spectral macrostate data clustering

Spectral clustering, which uses the global information embedded in eigenvectors of an interitem relationship matrix, can outperform traditional approaches such as k-means and hierarchical clustering. Spectral hierarchical bipartitioning is well-understood, but spectral multipartitioning remains an interesting research topic. Korenblum and Shalloway [Phys. Rev. E 67, 056704 (2003)] used an analo...

Macrostate data clustering.

We develop an effective nonhierarchical data clustering method using an analogy to the dynamic coarse graining of a stochastic system. Analyzing the eigensystem of an interitem transition matrix identifies fuzzy clusters corresponding to the metastable macroscopic states (macrostates) of a diffusive system. A "minimum uncertainty criterion" determines the linear transformation from eigenvectors...

LogDet Rank Minimization with Application to Subspace Clustering

Low-rank matrix is desired in many machine learning and computer vision problems. Most of the recent studies use the nuclear norm as a convex surrogate of the rank operator. However, all singular values are simply added together by the nuclear norm, and thus the rank may not be well approximated in practical problems. In this paper, we propose using a log-determinant (LogDet) function as a smoo...

Constructing Uncertainty Sets for Robust Linear Optimization

In this paper, we propose a methodology for constructing uncertainty sets within the framework of robust optimization for linear optimization problems with uncertain parameters. Our approach relies on decision-maker risk preferences. Specifically, we utilize the theory of coherent risk measures initiated by Artzner et al. [3], and show that such risk measures, in conjunction with the support of...

Practical challenges that arise when clustering the web using spectral methods

This is a report on an implementation of a spectral clustering algorithm for classifying very large internet sites, with special emphasis on the practical problems encountered in developing such a data mining system. Remarkably, some of these technical difficulties are due to fundamental issues pertaining to the mathematics involved, and are not treated properly in the literature. Others are inh...

Publication date: 2008